In this worked example, we’ll take a look at the Meuse river data set from the sp package1.
You of course need to download and install the package if you don’t have it already. Then make the data set available using data(meuse).
library(sp)
data(meuse)This map data is just a simple data.frame, but since we will be working with simple features, we need to convert this data set to such. First, however, notice that if you look at the x and y values here, it’s clear that they’re actually not longitude and latitude measurements. By looking at the documentation for meuse, we instead see that they are in fact Rijksdriehoek (RDH) (Netherlands topographical) map coordinates. To everything work out correctly later on, we need to make sure that our simple features data understands this information.
To convert the data into a simple features object, we use the st_as_sf() function, using the coords argument to specify which variables represent our x and y coordinates, and then we use the crs argument to denote which projection system they are measured in. RDH coordinates have the EPSG code 28992. So we use that as input.
library(sf)
meuse_sf <-
meuse %>%
st_as_sf(coords = c("x", "y"), crs = 28992) %>%
st_transform(crs = 4326)Before we dig into this data set, let’s set up the ggplot2 theme for mapping. This time, we use the theme_map() (instead of theme_void(), which we used previously) from the ggthemes package.
library(tidyverse)
library(ggthemes)
theme_set(theme_map(base_size = 11))Let’s start with a simple plot, looking at lead, creating a bubble plot by mapping the presence of lead to size of the points.
ggplot(meuse_sf) +
geom_sf(aes(size = lead)) +
theme(legend.position = "right")Concentration of lead around the river Meuse.
The meuse data set is so called because it’s related to pollution data around the river Meuse, so it makes sense that we actually try to include the location of this river in our plot too.
To do so, we need some map data that represents this river. Our trusty rnaturalearth package yet again comes to the rescue, although this time the data is actually not included directly in the package but must instead be downloaded using the ne_download() function.
Let’s begin by downloading the data.
library(rnaturalearth)
rivers <- ne_download(
scale = 10,
type = "rivers_lake_centerlines",
category = "physical",
returnclass = "sf"
)## OGR data source with driver: ESRI Shapefile
## Source: "/tmp/RtmpnDLdAy", layer: "ne_10m_rivers_lake_centerlines"
## with 1455 features
## It has 34 fields
## Integer64 fields read as strings: rivernum ne_id
rivers contains rivers and lakes from all across the world, which means that if we were to plot it, we would get a lot of information we don’t need. There are two ways around this, but either way we first need to figure out a bounding box that includes the coordinates from the data set, perhaps with a little extra margin. As I hope you’ve started to realize by now, there often is a st_*() function for whatever mapping related issue you can think of. This is the case here to: st_bbox().
bbox <- st_bbox(meuse_sf)
bbox## xmin ymin xmax ymax
## 5.72319 50.95661 5.76304 50.99156
The first and dirtiest method for plotting the data is just to use the coord_sf() function to provide limits using our bbox object. There’s nothing wrong with using this method, and we’ll stick to it here, but we note that a more elegant way might be to use st_crop(), which works directly with our bounding box above.
# Alternative solution: river_meuse <- st_crop(rivers, bbox)
ggplot(meuse_sf) +
geom_sf(aes(size = lead)) +
geom_sf(data = rivers) +
coord_sf(
xlim = c(bbox[["xmin"]], bbox[["xmax"]]),
ylim = c(bbox[["ymin"]], bbox[["ymax"]])
) +
theme(legend.position = "right")Regrettably, this is not quite what we would like to see. The resolution of this vector raster is too low and the river ends up being just a line. Let’s try something different.
What we’ll do instead is to plot a raster map of the river and the surrounding area. We’ll download the map from stamen maps, using ggmap.
library(ggmap)To download a map, we need to call get_stamenmap() and specify a bounding box. Fortunately for us, we already computed the bounding box of the points in the last step, so let’s use that.
# extend the ranges slightly
xrng <- extendrange(c(bbox$xmin, bbox$xmax))
yrng <- extendrange(c(bbox$ymin, bbox$ymax))
meuse_raster <- get_stamenmap(
c(
left = xrng[1],
bottom = yrng[1],
right = xrng[2],
top = yrng[2]
),
zoom = 14,
maptype = "terrain"
)To then build the map, we use the ggmap() function instead of ggplot(). Now we run into a hurdle, however, because ggmap and sf unfortunately does not play so well together2. As it stands, right now, it is therefore better to use the standard map features when working with raster maps from ggmap. To do so, we convert our meuse coordinates from simple features back to a standard data frame.
meuse_df <- cbind(
st_drop_geometry(meuse_sf),
st_coordinates(meuse_sf)
)Now we can use the raster map together with our points, simply overlaying them using the geom_point() function.
ggmap(meuse_raster) +
geom_point(aes(X, Y, size = lead), data = meuse_df) +
theme(legend.position = "right")Let’s make the plot slightly more interesting by including data on the other heavy metals too. One way to do so it to use facets, but for that to work, we need to have our data in a long format, where one column denotes the variable names and another notes their values.
meuse_df_long <-
meuse_df %>%
pivot_longer(
cadmium:zinc,
names_to = "metal",
values_to = "concentration"
)We construct our plot as before, faceting on the type of metal.
ggmap(meuse_raster) +
geom_point(
aes(X, Y, size = concentration),
alpha = 0.5,
data = meuse_df_long
) +
facet_wrap("metal") +
theme(legend.position = "right")Concentration of different heavy metals around the river Meuse in the Netherlands.
Purely for instructional reasons, let’s assume that we were considering some kind of cleanup of this heavy metal waste, but wanted to see how this would affect affect the ruins of the nearby Kasteel Stein by displaying it on the map.
We don’t know the coordinates of the castle off-hand, so let’s find them out by geocoding. In the lecture, we used the facilities of ggmap for geocoding, but unfortunately the only API that is currently supported in ggmap is Google’s API, which is not free.
Another alternative is to use tidygeocoder, which features a host of geocoding alternatives. A couple of these are free, for instance the Nominatim API.3
Start by generating a tibble (or data.frame) with the names and addresses that you want to geocode.
library(tidygeocoder)
addresses <- tribble(
~ name, ~ address,
"Kasteel Stein", "Kasteel Stein, Netherlands"
)Next, pipe these addresses on to tidygeocoder::geocode().4
kasteel_stein <-
addresses %>%
tidygeocoder::geocode(
address, # map to address
method = "osm" # for Nominatim API
)
kasteel_stein## # A tibble: 1 × 4
## name address lat long
## <chr> <chr> <dbl> <dbl>
## 1 Kasteel Stein Kasteel Stein, Netherlands 51.0 5.76
To finish our map, we complement our previous plot with two layers for our new, geocoded data, to show a label and a dot for the castle ruins.
ggmap(meuse_raster) +
geom_point(
aes(X, Y, size = concentration),
alpha = 0.5,
data = meuse_df_long
) +
geom_point(
aes(long, lat),
col = "red",
data = kasteel_stein
) +
geom_label(
aes(long, lat, label = name),
vjust = -0.5,
data = kasteel_stein
) +
facet_wrap("metal") +
theme(legend.position = "right")Concentration of different heavy metals around the river Meuse in the Netherlands.
The source code for this document is available at https://github.com/stat-lu/dataviz/blob/main/worked-example-chile-plebiscite.Rmd.
The sp package is a full-featured solution for visualization data, and can be used in its own right to create maps in R.↩︎
In this case, the problem would actually be hardly noticeable because the map is on a small scale and the problem, which occurs because the bounding box of the raster is in a different projection, results in no noticeable distorsion.↩︎
The API is free, but it is also rate-limited at one request per second and not as powerful as some of the other available APIs. If you have a project where you need more serious geocoding, you should probably get a key for one of the other APIs.↩︎
There is a namespace clash here with ggmap, which has its own ggmap::geocode() function, so we need to specify which function to use with :: here.↩︎